Learning Objectives: In this tutorial, you will learn how to use the DemoKin package to analyze family structure and kinship networks, understand the mechanics of time-invariant models, and visualize changes in kinship relations across the life course.

1 Introduction

Kinship is a fundamental property of human populations and a key form of social structure. Demographers have long been interested in the interplay between demographic change and family configuration. This has led to the development of sophisticated methodological and conceptual approaches for the study of kinship, some of which are explored in this tutorial.

Kinship analysis can answer a range of important questions:

  • How does family size change over an individual’s life course?
  • How does family structure evolve as populations undergo demographic transition?
  • How many relatives might people have at different ages, and what is the age distribution of these relatives?

In this tutorial, we will implement matrix kinship models using the DemoKin package to calculate kin counts and age distributions. We begin with the simplest model: a time-invariant one-sex model. In this model, we assume that everyone in the population experiences the same mortality and fertility rates throughout their lives (e.g., the 2015 rates), and we only trace female kin relationships.

1.1 Preparation

Before starting the tutorial, please complete the following preparatory steps:

  1. If you haven’t already, install R and RStudio. This is a useful tutorial: https://rstudio-education.github.io/hopr/starting.html
  2. Install the following packages in R:
# Install basic data analysis packages
install.packages("dplyr")     # Data manipulation
install.packages("tidyr")     # Data tidying
install.packages("ggplot2")   # Data visualization
install.packages("readr")     # Data import
install.packages("knitr")     # Document generation
install.packages("data.table")# Efficient data handling
install.packages("Matrix")    # Matrix operations

# Install DemoKin
# DemoKin is available on CRAN (https://cran.r-project.org/web/packages/DemoKin/index.html), 
# but we'll use the development version on GitHub (https://github.com/IvanWilli/DemoKin):
install.packages("remotes")
remotes::install_github("IvanWilli/DemoKin")
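
If some of these packages are already installed, you may prefer to check before reinstalling. The following is a minimal sketch in base R; the names `needed` and `missing_pkgs` are illustrative, and the snippet only reports missing packages unless you uncomment the install line.

```r
# CRAN packages used in this tutorial
needed <- c("dplyr", "tidyr", "ggplot2", "readr", "knitr",
            "data.table", "Matrix", "remotes")

# requireNamespace() returns FALSE (rather than failing) if a package is absent
is_installed <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- needed[!is_installed]

if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
  # install.packages(missing_pkgs)  # uncomment to install them
} else {
  message("All CRAN packages are already installed.")
}
```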

2 Setting Up the Analysis Environment

Let’s begin by loading the necessary packages for our analysis:

library(dplyr)    # For data manipulation
library(tidyr)    # For restructuring data
library(ggplot2)  # For visualization
library(readr)    # For reading data
library(knitr)    # For document generation
library(DemoKin)  # For kinship analysis

Next, load the additional utility functions prepared for this tutorial (this assumes functions.R is in your working directory):

source("functions.R")

3 Understanding the Demographic Data

3.1 Data Overview

The DemoKin package includes Swedish demographic data from the Human Mortality Database (HMD) and Human Fertility Database (HFD) as an example dataset. This includes:

  • swe_px: Age-by-year matrix of survival probabilities
  • swe_Sx: Age-by-year matrix of survival ratios
  • swe_asfr: Age-by-year matrix of fertility rates
  • swe_pop: Age-by-year matrix of population counts

You can view all available data in the package with data(package="DemoKin").

3.2 Exploring the Data

Let’s examine a subset of the Swedish demographic data to understand its structure:

# Survival probabilities for the first 10 years; head() prints only the first 6 rows (ages 0-5)
head(swe_px[1:10, 1:10])
##      1900    1901    1902    1903    1904    1905    1906    1907    1908
## 0 0.91060 0.90673 0.92298 0.91890 0.92357 0.92094 0.92717 0.93134 0.92217
## 1 0.97225 0.97293 0.97528 0.97549 0.97847 0.97844 0.98066 0.98175 0.97928
## 2 0.98525 0.98579 0.98630 0.98835 0.98921 0.98914 0.99050 0.99149 0.99135
## 3 0.98998 0.98947 0.99079 0.99125 0.99226 0.99112 0.99341 0.99351 0.99383
## 4 0.99158 0.99133 0.99231 0.99352 0.99272 0.99300 0.99392 0.99539 0.99526
## 5 0.99310 0.99253 0.99401 0.99388 0.99468 0.99394 0.99542 0.99587 0.99570
##      1909
## 0 0.93524
## 1 0.98415
## 2 0.99200
## 3 0.99429
## 4 0.99560
## 5 0.99624
# Fertility rates in rows 20-30, i.e. ages 19-29 because ages start at 0; head() prints the first 6
head(swe_asfr[20:30, 1:10])
##       1900    1901    1902    1903    1904    1905    1906    1907    1908
## 19 0.04409 0.04357 0.04742 0.04380 0.04523 0.04415 0.04779 0.04910 0.05205
## 20 0.06776 0.07122 0.06989 0.06792 0.06952 0.06981 0.07187 0.07211 0.07994
## 21 0.09643 0.09931 0.09613 0.09654 0.09546 0.09437 0.09761 0.10108 0.10547
## 22 0.12512 0.12555 0.12526 0.11899 0.12269 0.11923 0.12264 0.12384 0.12738
## 23 0.14631 0.14792 0.14743 0.14237 0.14304 0.14502 0.14433 0.14440 0.14694
## 24 0.16285 0.16847 0.16455 0.16279 0.15931 0.15960 0.16276 0.16271 0.16524
##       1909
## 19 0.05274
## 20 0.07930
## 21 0.10456
## 22 0.12639
## 23 0.14607
## 24 0.16087
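
Because ages start at 0, row position and age label differ by one: row 20 holds age 19. The sketch below uses an invented toy matrix (not the Swedish data) to contrast selecting by position with selecting by age label.

```r
# Toy age-by-year matrix: ages 0-40 in rows, two years in columns
ages  <- 0:40
years <- c("1900", "1901")
toy_asfr <- matrix(seq_along(ages) / 1000,
                   nrow = length(ages), ncol = length(years),
                   dimnames = list(as.character(ages), years))

# Selecting by position: rows 20-30 hold ages 19-29
by_position <- toy_asfr[20:30, ]

# Selecting by label: character indices match the row names, i.e. ages 20-30
by_label <- toy_asfr[as.character(20:30), ]

rownames(by_position)[1]  # "19"
rownames(by_label)[1]     # "20"
```

Indexing by label is usually the safer choice when you want a specific age range.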

For our time-invariant model, we need to extract the demographic rates for a single year. Let’s use 2015 as our reference year:

# Extract vectors for 2015
swe_surv_2015 <- swe_px[,"2015"]  # Survival probabilities
swe_asfr_2015 <- swe_asfr[,"2015"] # Fertility rates

Let’s compare the data between different time periods to understand demographic changes. Here we compare values from 1950 and 2000:

# Survival probabilities
cat("Survival probabilities (px):\n")
## Survival probabilities (px):
head(swe_px[,c("1950","2000")])
##      1950    2000
## 0 0.98237 0.99717
## 1 0.99833 0.99984
## 2 0.99885 0.99986
## 3 0.99904 0.99996
## 4 0.99938 0.99988
## 5 0.99920 0.99992
# Fertility rates (all zero at ages 0-5, the rows head() returns; inspect reproductive ages to compare periods)
cat("\nFertility rates (asfr):\n")
## 
## Fertility rates (asfr):
head(swe_asfr[,c("1950","2000")])
##   1950 2000
## 0    0    0
## 1    0    0
## 2    0    0
## 3    0    0
## 4    0    0
## 5    0    0
# Population counts
cat("\nPopulation counts:\n")
## 
## Population counts:
head(swe_pop[,c("1950","2000")])
##    1950  2000
## 0 57780 43058
## 1 60451 43599
## 2 61288 44356
## 3 62970 46880
## 4 63089 50383
## 5 62963 55150
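
Scanning the first few ages gives only a partial picture, so it can help to collapse each year into a summary indicator. The sketch below uses invented toy matrices with the same age-by-year layout; with single-year age groups, the column sum of the fertility rates is the total fertility rate, and the product of the survival probabilities is the chance of surviving all listed ages.

```r
# Toy age-by-year matrices (values invented for illustration only)
toy_asfr <- cbind("1950" = c(0, 0.10, 0.12, 0.05),
                  "2000" = c(0, 0.05, 0.10, 0.06))
toy_px   <- cbind("1950" = c(0.98, 0.99, 0.99, 0.98),
                  "2000" = c(0.997, 0.999, 0.999, 0.998))
rownames(toy_asfr) <- rownames(toy_px) <- 0:3

# Total fertility rate: sum of age-specific rates within each year
tfr <- colSums(toy_asfr)

# Probability of surviving through every listed age, per year
surv_all_ages <- apply(toy_px, 2, prod)

tfr
surv_all_ages
```

Applied to the package data, the same pattern would be `colSums(swe_asfr)` and `apply(swe_px, 2, prod)`.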

4 The DemoKin Package

4.1 Overview

DemoKin is an R package designed to compute the number and age distribution of relatives (kin) of a focal individual under various demographic assumptions. It can analyze both living and deceased kin, and allows for both time-invariant and time-varying demographic rates.

4.2 The kin() Function

The main function in the package is DemoKin::kin(), which implements matrix kinship models to calculate expected kin counts.

For our first example, we’ll run the simplest model with the following assumptions:

  1. Time-invariant rates: The same set of mortality and fertility rates applies throughout all time periods (we’ll use 2015 rates).
  2. One-sex population: We’ll only use female data and trace kinship through female lines.

Let’s run the basic kinship model:

# Run the time-invariant, one-sex model
swe_2015 <- kin(
  p = swe_surv_2015,          # Vector of survival probabilities
  f = swe_asfr_2015,          # Vector of fertility rates
  time_invariant = TRUE       # Use time-invariant model
)
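
To build intuition for what a matrix kinship model does, here is a minimal sketch of the daughters recursion in the spirit of Caswell (2019). It is not DemoKin's internal code. The four-age-class rates are invented, `f` is assumed to be daughters per woman already, and Focal is assumed to be alive at every age.

```r
# Toy rates for 4 age classes (invented for illustration)
p <- c(0.9, 0.8, 0.5, 0)   # survival probabilities
f <- c(0, 0.5, 0.3, 0)     # daughters per woman in each age class
n <- length(p)

# Survival matrix U: survival probabilities on the subdiagonal
U <- matrix(0, n, n)
U[cbind(2:n, 1:(n - 1))] <- p[1:(n - 1)]

# Fertility matrix Fmat: fertility rates in the first row (newborn age class)
Fmat <- matrix(0, n, n)
Fmat[1, ] <- f

# daughters[i, x]: expected daughters in age class i when Focal is in age class x
daughters <- matrix(0, n, n)
for (x in 1:(n - 1)) {
  e_x <- as.numeric(seq_len(n) == x)   # Focal occupies age class x
  # Existing daughters survive and age; Focal adds newborn daughters
  daughters[, x + 1] <- U %*% daughters[, x] + Fmat %*% e_x
}

# Expected number of living daughters at each age class of Focal
colSums(daughters)
```

Other kin types follow the same survive-and-replenish logic, but with different starting conditions and different sources of new kin.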

4.3 Function Arguments

The kin() function accepts several important arguments:

  • p: A vector or matrix of survival probabilities with rows as ages (and columns as years if a matrix)
  • f: A vector or matrix of fertility rates with the same dimensions as p
  • time_invariant: Logical flag indicating whether to assume time-invariant rates (default: TRUE)
  • output_kin: Character vector specifying which kin types to return (e.g., “m” for mother, “d” for daughter)
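
Since p and f must share the same dimensions, a quick check before calling kin() can catch mismatched inputs early. The helper below, `check_rates()`, is hypothetical and not part of DemoKin. It is a minimal base R sketch that accepts either vectors or matrices.

```r
# Hypothetical helper (not a DemoKin function): basic sanity checks on rates
check_rates <- function(p, f) {
  p <- as.matrix(p)   # a vector becomes a one-column matrix
  f <- as.matrix(f)
  # Named conditions in stopifnot() need R >= 4.0.0
  stopifnot(
    "p and f must have the same dimensions" = identical(dim(p), dim(f)),
    "survival probabilities must lie in [0, 1]" = all(p >= 0 & p <= 1),
    "fertility rates must be non-negative" = all(f >= 0)
  )
  invisible(TRUE)
}

# Passes silently for well-formed toy inputs
check_rates(p = c(0.99, 0.98, 0.95), f = c(0, 0.1, 0.05))

# A probability above 1 triggers an informative error
result <- try(check_rates(p = c(0.99, 1.20), f = c(0, 0.1)), silent = TRUE)
inherits(result, "try-error")
```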

4.4 Relative Types

In DemoKin, each type of relative is identified by a unique code. These codes differ from those used in Caswell (2019). The following table shows the relationship between these coding systems:

# Display relationship codes
demokin_codes

4.5 Function Output

The kin() function returns a list containing two data frames:

# Examine the structure of the output
str(swe_2015)
## List of 2
##  $ kin_full   : tibble [142,814 × 7] (S3: tbl_df/tbl/data.frame)
##   ..$ kin      : chr [1:142814] "d" "d" "d" "d" ...
##   ..$ age_kin  : int [1:142814] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ age_focal: int [1:142814] 0 1 2 3 4 5 6 7 8 9 ...
##   ..$ living   : num [1:142814] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ dead     : num [1:142814] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ cohort   : logi [1:142814] NA NA NA NA NA NA ...
##   ..$ year     : logi [1:142814] NA NA NA NA NA NA ...
##  $ kin_summary: tibble [1,414 × 10] (S3: tbl_df/tbl/data.frame)
##   ..$ age_focal     : int [1:1414] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ kin           : chr [1:1414] "coa" "cya" "d" "gd" ...
##   ..$ year          : logi [1:1414] NA NA NA NA NA NA ...
##   ..$ cohort        : logi [1:1414] NA NA NA NA NA NA ...
##   ..$ count_living  : num [1:1414] 0.2752 0.0898 0 0 0 ...
##   ..$ mean_age      : num [1:1414] 8.32 4.05 NaN NaN NaN ...
##   ..$ sd_age        : num [1:1414] 6.14 3.68 NaN NaN NaN ...
##   ..$ count_dead    : num [1:1414] 0.0000633 0.000037 0 0 0 ...
##   ..$ count_cum_dead: num [1:1414] 0.0000633 0.000037 0 0 0 ...
##   ..$ mean_age_lost : num [1:1414] 0 0 NaN NaN NaN 0 0 0 0 NaN ...

4.5.1 The kin_full Data Frame

This data frame contains detailed information on expected kin counts by:

  • Age of the focal individual
  • Type of kin
  • Age of kin
  • Living/dead status

# View the first few rows of kin_full
head(swe_2015$kin_full)

4.5.2 The kin_summary Data Frame

This data frame provides a summary of expected kin counts by:

  • Age of the focal individual
  • Type of kin
  • Total counts (not broken down by age of kin)

# View the first few rows of kin_summary
head(swe_2015$kin_summary)
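
The two data frames are closely related: the summary columns can be understood as aggregates of kin_full over the age of kin. The sketch below illustrates this with an invented toy table that borrows the kin_full column names. It assumes that count_living is a sum and mean_age a count-weighted average, and it uses base R to keep the example dependency-free.

```r
# Toy stand-in for kin_full at a single age of Focal (values invented)
kin_full_toy <- data.frame(
  kin       = c("d", "d", "m", "m"),
  age_focal = 30,
  age_kin   = c(0, 5, 55, 60),
  living    = c(0.2, 0.3, 0.4, 0.5)
)

# Total expected living kin of each type (sum over age of kin)
count_living <- tapply(kin_full_toy$living, kin_full_toy$kin, sum)

# Mean age of kin, weighting each age by the expected count at that age
weighted_age <- tapply(kin_full_toy$living * kin_full_toy$age_kin,
                       kin_full_toy$kin, sum)
mean_age <- weighted_age / count_living

count_living
mean_age
```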

5 Visualizing Kinship Networks

5.1 Keyfitz Diagrams

One powerful way to visualize kinship structure is through a network or ‘Keyfitz’ kinship diagram (Keyfitz and Caswell 2005). Let’s plot the expected number of living female relatives for a 65-year-old woman according to our model:

swe_2015$kin_summary %>% 
  filter(age_focal == 65) %>% 
  select(kin, count = count_living) %>% 
  plot_diagram(rounding = 2)

Interpretation: This Keyfitz diagram provides a comprehensive view of the kinship network for a 65-year-old woman in Sweden (based on 2015 demographic rates). The diagram shows:

  • Vertical relationships: A 65-year-old woman is likely to have around 1.85 daughters and 1.73 granddaughters, but few great-granddaughters (0.17), as most would not have been born yet. Looking upward, she’s unlikely to have a living mother (0.12) and almost certainly no living grandmother.
  • Horizontal relationships: She would have about 0.93 living sisters and 1.74 nieces.

This visualization helps us understand the changing composition of family networks across the life course.

6 Analyzing Living Kin Over the Life Course

Let’s run the model again, but this time we’ll specify exactly which kin types we want to analyze:

swe_2015 <- 
  kin(
    p = swe_surv_2015,
    f = swe_asfr_2015,
    output_kin = c("c", "d", "gd", "ggd", "gm", "m", "n", "a", "s"),  # Specific kin types
    time_invariant = TRUE
  )

Now, let’s visualize how the expected number of each type of relative changes over the life course:

swe_2015$kin_summary %>%
  rename_kin() %>%  # Convert kin codes to readable labels
  ggplot() +
  geom_line(aes(age_focal, count_living), linewidth = 1)  +
  theme_bw() +
  labs(
    title = "Expected number of living female relatives over the life course",
    subtitle = "Based on Swedish demographic rates from 2015",
    x = "Age of focal individual",
    y = "Number of living female relatives"
  ) +
  facet_wrap(~kin_label, scales = "free_y")  # Use different y-scales for each panel

Interpretation: These plots show how different kinship relationships evolve over a person’s lifetime:

  • Mothers: Initially 1.0 (everyone has a mother at birth), then gradually declining as mortality takes its toll
  • Grandmothers: Start lower (many already deceased at Focal’s birth) and decline rapidly
  • Daughters: Increasing during reproductive years, then stable
  • Granddaughters: Appearing later and increasing as daughters have children
  • Great-granddaughters: Appearing even later as granddaughters have children
  • Sisters: Relatively stable then declining due to mortality
  • Aunts and cousins: Follow similar patterns of eventual decline

Note that we are working in a time-invariant framework. You can think of the results as analogous to life expectancy (i.e., expected years of life for a synthetic cohort experiencing a given set of period mortality rates).
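
To make the life expectancy analogy concrete, the sketch below computes a synthetic-cohort (curtate) life expectancy from a single set of period survival probabilities. The values are invented toy numbers. Expected kin counts are read the same way: they describe a hypothetical person exposed to one year's rates for her whole life.

```r
# Invented period survival probabilities for ages 0-3 (nobody survives past age 3)
toy_px <- c(0.9, 0.8, 0.5, 0)

# Probability of surviving from birth to ages 1, 2, 3 and 4
lx_next <- cumprod(toy_px)

# Curtate life expectancy at birth: expected number of complete years lived
e0_curtate <- sum(lx_next)
e0_curtate
```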

6.1 Total Family Size Over the Life Course

How does the overall family size (and family composition) vary over life for an average woman?

# Calculate total kin count at each age
counts <- 
  swe_2015$kin_summary %>%
  group_by(age_focal) %>% 
  summarise(count_living = sum(count_living)) %>% 
  ungroup()

# Plot family composition over the life course
swe_2015$kin_summary %>%
  select(age_focal, kin, count_living) %>% 
  rename_kin() %>% 
  ggplot(aes(x = age_focal, y = count_living)) +
  geom_area(aes(fill = kin_label), color = "black", alpha = 0.8) +
  geom_line(data = counts, linewidth = 1.5) +
  labs(
    title = "Family size and composition over the life course",
    subtitle = "Based on Swedish demographic rates from 2015",
    x = "Age of focal individual",
    y = "Number of living female relatives",
    fill = "Kin type"
  ) +
  theme_bw() +
  theme(legend.position = "bottom")

Interpretation: This stacked area chart reveals fascinating patterns in family size and composition throughout life:

  1. Early life: Family consists primarily of mothers, grandmothers, aunts, and sisters
  2. Young adulthood (20s-30s): Total family size increases as daughters are born
  3. Middle age (40s-50s): Another increase as granddaughters arrive, while older relatives (mothers, grandmothers) begin to disappear
  4. Older age (60s+): Family composition shifts dramatically toward descendants (daughters, granddaughters, great-granddaughters)

The total family size (black line) shows an interesting U-shape, first declining as older relatives die, then rising again as new generations are born.
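
One way to quantify the shift in composition is to convert counts into shares of total kin at each age of Focal. The sketch below uses an invented toy table with the kin_summary column names (age_focal, kin, count_living) and base R only.

```r
# Toy stand-in for kin_summary (values invented for illustration)
kin_summary_toy <- data.frame(
  age_focal    = c(0, 0, 65, 65),
  kin          = c("m", "d", "m", "d"),
  count_living = c(1, 0, 0.1, 0.9)
)

# Cross-tabulate expected counts: one row per age of Focal, one column per kin type
counts_by_age <- xtabs(count_living ~ age_focal + kin, data = kin_summary_toy)

# Divide each row by its total to obtain composition shares
shares <- prop.table(counts_by_age, margin = 1)
shares
```

In this toy example, the mother accounts for all kin at birth, while daughters make up 90% of kin at age 65.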

7 Age Distribution of Relatives

Beyond counting relatives, we’re often interested in their age distribution. Using the kin_full data frame, we can examine the age distribution of Focal’s relatives when Focal is at a given age.

Let’s visualize the age distribution of relatives when Focal is 65 years old:

swe_2015$kin_full %>%
  rename_kin() %>%
  filter(age_focal == 65) %>%
  ggplot(aes(age_kin, living)) +
  geom_line(linewidth = 1) +
  geom_vline(xintercept = 65, color = "red", linetype = "dashed") +
  labs(
    title = "Age distribution of living female relatives when Focal is 65",
    subtitle = "Based on Swedish demographic rates from 2015 (red line = Focal's age)",
    x = "Age of relative",
    y = "Expected number of living relatives"
  ) +
  theme_bw() +
  facet_wrap(~kin_label, scales = "free_y")

Interpretation: These distributions provide rich information about family age structure:

  • Mothers: If still alive, would be concentrated around age 85-95
  • Daughters: Mostly in their 30s and 40s
  • Granddaughters: Predominantly young, between ages 5-20
  • Sisters: Close to Focal’s own age (65)
  • Nieces: Mostly in their 40s and 50s
  • Cousins: Similar in age to Focal

Understanding age distributions is crucial for estimating care needs, support systems, and intergenerational transfers within families.
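
An age distribution can also be condensed into a few numbers, such as the mean and standard deviation of kin ages or the share of kin above an age threshold. The sketch below does this for one kin type using invented toy values. The weighting by expected counts is an assumption about how such summaries are typically computed, not a description of DemoKin's internals.

```r
# Toy age distribution of one kin type at a fixed age of Focal (values invented)
age_kin <- c(30, 35, 40, 45)
living  <- c(0.2, 0.4, 0.3, 0.1)   # expected number of living kin at each age

total_living <- sum(living)

# Mean and standard deviation of kin age, weighted by expected counts
mean_age <- sum(age_kin * living) / total_living
sd_age   <- sqrt(sum(living * (age_kin - mean_age)^2) / total_living)

# Share of these kin aged 40 or older
share_40plus <- sum(living[age_kin >= 40]) / total_living

c(mean_age = mean_age, sd_age = sd_age, share_40plus = share_40plus)
```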

8 Conclusion

In this tutorial, we’ve explored how to use the DemoKin package to model kinship dynamics in a time-invariant, one-sex framework. We’ve seen how a single set of period rates (Sweden, 2015) translates into expected family size and composition, and visualized these relationships across the life course.

Key insights include:

  1. Family networks are dynamic, changing dramatically throughout the life course
  2. Both family size and composition evolve with age
  3. Modern demographic rates are associated with “beanpole” families: vertical extension (multiple generations alive at once) but horizontal contraction (fewer siblings and cousins)
  4. Matrix population models provide a powerful framework for understanding these dynamics

In real-world applications, these models can inform:

  • Planning for eldercare needs
  • Understanding support systems for young families
  • Estimating intergenerational wealth transfers
  • Forecasting demographic dependency ratios

Advanced extensions to this model could include:

  • Two-sex models (tracking both male and female relatives)
  • Time-varying models (accounting for historical demographic change)
  • Stochastic models (incorporating uncertainty)

References

Caswell, Hal. 2019. “The Formal Demography of Kinship: A Matrix Formulation.” Demographic Research 41 (September): 679–712. https://doi.org/10.4054/DemRes.2019.41.24.
Keyfitz, Nathan, and Hal Caswell. 2005. Applied Mathematical Demography. Vol. 47. Springer.